Cross-Linguistic Knowledge Induction from Parallel Corpora
Abstract
Parallel corpora encode extremely valuable linguistic knowledge, and recent advances in multilingual corpus linguistics make that knowledge easier to uncover. The linguistic decisions that human translators make in order to convey the meaning of the source text faithfully can be traced, and they provide evidence of linguistic facts that a computer program might overlook in a monolingual context. When linguistic annotations are available or easy to produce for some, but not all, of the languages in a parallel corpus, inductive learning methods provide powerful support for the systematic and consistent cross-lingual transfer of linguistic interpretations and allow focused comparative studies of the corpus languages.
Similar resources
Parallel Corpora, Alignment Technologies and Further Prospects in Multilingual Resources and Technology Infrastructure
Multilingual technologies, which to a large extent are language independent, provide a powerful support for easier building of annotated linguistic resources for languages where such resources are scarce or missing. All these technologies require parallel corpora in order to achieve their ends. Parallel texts encode extremely valuable linguistic knowledge because the linguistic decisions made b...
Automatic transfer rule induction from parallel corpora
Recently, many projects have been proposed aiming at automatically transforming the multilingual information available on parallel texts into linguistic knowledge useful for machine translation. This paper describes an ongoing PhD project in which the main goal is to automatically induce transfer rules and bilingual dictionaries from part-of-speech tagged and lexically aligned parallel corpora....
Multilingual Document Alignment - A Study with Chinese and Japanese
The natural language processing (NLP) community is increasingly using parallel and comparable corpora for cross-linguistic research. The knowledge extracted from such corpora helps in cross-language information retrieval, topic detection and tracking, machine translation, and many other NLP tasks. Parallel or comparable corpora for the Japanese-Chinese language pair are rare. We investigate an automatic...
Parallel Chinese-English Entities, Relations and Events Corpora
This paper introduces the parallel Chinese-English Entities, Relations and Events (ERE) corpora developed by the Linguistic Data Consortium under the DARPA Deep Exploration and Filtering of Text (DEFT) Program. Original Chinese newswire and discussion forum documents are annotated for two versions of the ERE task. The texts are manually translated into English and then annotated for the same ERE ta...
Cross-lingual Propagation for Morphological Analysis
Multilingual parallel text corpora provide a powerful means for propagating linguistic knowledge across languages. We present a model which jointly learns linguistic structure for each language while inducing links between them. Our model supports fully symmetrical knowledge transfer, utilizing any combination of supervised and unsupervised data across language barriers. The proposed non-parame...